Multi-port memory for flexible and space efficient corner 
turning networks in associative processors 



A typical SIMD parallel processor architecture comprises a large capacity external memory, known as the 
Secondary Data Store (SDS) and a Primary Data Store (PDS) - which may be considered to be the local 
data store of a SIMD processor. If the SIMD processor is bit-serial in nature, then the typical organisation 
of the PDS is a single word of PDS memory per SIMD processing element. 

As embodied in Aspex's ASP (Associative String Processor) architecture, the PDS is a dual-ported 
orthogonal memory, providing a corner-turning function whereby the data items for each SIMD processing 
element are presented as a sequence of bit-parallel, word-serial transfers - which are subsequently 
transferred to the SIMD processor(s) in a bit-serial, word-parallel fashion. The invention is an extension to 
Aspex's existing PDS memory architecture, exploiting the full width of the PDS word-serial interface to 
transfer a packed word comprising a number of data items, where an item may be any power-of-two 
multiple of 8 bits (i.e. 8, 16, 32 etc.) up to the full width of the data interface. During a single data load the 
items of the packed word are written to the PDS, being stored into a number of adjacent word rows, one item 
per word row (i.e. per SIMD processing element). The invention also supports an extension to bit-serial, 
word-parallel transfer, whereby the bit-serial transfer of the items into the registers of the SIMD processing 
elements requires only N instruction cycles, where N is the precision of the data items being transferred. 
These operations are fully orthogonal 
and may apply to data dump as well as data load operations. 

1. Introduction 

The invention is intended for use in digital computers, and more specifically in digital computers that have a 
Single Instruction Multiple Data (SIMD) data processor. 

A particular sub-class of SIMD data processors is known as associative processors. Such processors utilise 
a class of memory known as associative or content-addressable memory. Such memories, as the name 
implies, do not operate by addressing the memory location in the conventional sense, but rather they 
compare the stored contents of a pre-defined field (or set of bits) of all memory words with a global bit 
pattern (comprising one or more bits). Those memory words which match the applied pattern during the 
operation (which is variously known as searching, matching or tagging) are marked in some way (tagged) 
in order that they might subsequently participate in a later operation which will, in some manner, modify 
the stored contents of the memory. 

The internal organisation of such memories is generally classified into: 

1. word organised (i.e. memories whereby a bit-parallel pattern may be used as the basis of the search and 
the bit-parallel comparison is carried out in a single indivisible operation), or 

2. bit-serial (i.e. only a single bit may be used as the basis of the search). 

In the latter class of memories, bit-parallel searches may be emulated by repeated application of bit-serial 
searches. 
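
By way of illustration only, the emulation of a bit-parallel search by repeated bit-serial searches may be sketched as follows. The function name `bit_serial_search`, the modelling of the memory as a list of integers and the tag list are assumptions made for this sketch; they do not describe the hardware of the invention.

```python
def bit_serial_search(words, pattern, field_bits):
    """Tag every word whose low `field_bits` bits equal `pattern`,
    examining one bit column per step (a software model only)."""
    tags = [True] * len(words)          # start with every word tagged
    for bit in range(field_bits):       # one bit-serial search per bit
        want = (pattern >> bit) & 1
        for i, word in enumerate(words):
            # a word stays tagged only if this bit also matches
            tags[i] = tags[i] and ((word >> bit) & 1) == want
    return tags


memory = [0b1010, 0b0110, 0b1010, 0b1111]
matches = bit_serial_search(memory, 0b1010, 4)
```

In this model a search over an N-bit field costs N bit-serial steps, whereas a word organised memory would perform the same comparison in one operation.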

In order to facilitate the transfer of information from a conventional memory sub-system (i.e. RAM) into 
the content addressable memory, a class of memory known as orthogonal memory may be employed. This 
invention introduces a novel kind of orthogonal memory known as the Primary Data Store (PDS). 

The PDS has the facility to accept data using conventional transfers (i.e. word sequential) from 
the memory sub-system (i.e. RAM). In this description such transfers will be classified as secondary 
transfers. 

Subsequently, the data stored in the PDS can be transferred bit-serially (also known as column sequential) 
into the associative memory which makes up the associative processor. 
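
The corner-turning function of an orthogonal memory, i.e. word-sequential loading followed by column-sequential read-out, may be pictured with the following minimal sketch. The name `corner_turn` and the list-based model are assumptions for illustration and are not taken from the embodiment.

```python
def corner_turn(words, width):
    """Return `width` bit columns; column b holds bit b of every word."""
    return [[(w >> b) & 1 for w in words] for b in range(width)]


pds_words = [0b0001, 0b0011, 0b0111]   # loaded word-sequentially
columns = corner_turn(pds_words, 4)    # read out column-sequentially
```

Each entry of `columns` corresponds to one bit-serial transfer that delivers the same bit position of every stored word at once.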
2. Description 



2.1. Invention 

The present invention is directed to a novel class of orthogonal memory that is organised to provide 
secondary transfers of multiple items of data, where those items of data are generally - although not 
necessarily - powers-of-2 sub-divisions of the entire secondary word width. 

A specific embodiment of the present invention will allow transfers of: 

• 8 x 8-bit items 

• 4 x 16-bit items 

• 2 x 32-bit items 

• 1 x 64-bit item 

where - in this case - the secondary word width is 64-bits. Such transfers will allow the reading or writing 
of the items described above in a single memory clock cycle. 
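
The packing of items into a 64-bit secondary word may be sketched as below. The function name `unpack_items` is hypothetical, and the choice of taking item 0 from the least significant bits is an assumption of the sketch.

```python
def unpack_items(word64, item_bits):
    """Split a 64-bit secondary word into 64 // item_bits items.
    Item 0 is taken from the least significant bits (an assumption)."""
    if item_bits not in (8, 16, 32, 64):
        raise ValueError("item width must be 8, 16, 32 or 64 bits")
    mask = (1 << item_bits) - 1
    return [(word64 >> (i * item_bits)) & mask
            for i in range(64 // item_bits)]


word = 0x0807060504030201
bytes_8 = unpack_items(word, 8)     # eight 8-bit items
halves_16 = unpack_items(word, 16)  # four 16-bit items
```

The same word therefore yields eight, four, two or one item depending only on the selected item width.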

The orthogonal memory will be organised so that the items described above will be transferred to (or from) 
designated fields of the PDS memory which are in contiguous - or adjacent - memory words. 

The embodiment employs an encoding scheme for the selection of the fields of PDS which are to be read or 
written, such that the number of word select wires needing to be accommodated in a given word row is 
sensibly minimised. 

The embodiment makes use of a shifting word pointer which increments down the word rows in steps 
configured to match the number of data items transferred. The provision of a shifting pointer organised as 
a shift register allows the invention to readily support fault tolerance by the automatic or user-defined 
bypass of faulty word rows by the provision of a bypass network and appropriate switches. Such 
fault tolerance would generally - although not necessarily - be accompanied by the provision of spare 
word rows. 

Furthermore, the embodiment provides for the items transferred in the secondary transfer described above 
to be transferred to the associative memory in a bit-serial (also known as column sequential) manner - a 
transfer known here as a Primary Data Transfer (PDT). This transfer will allow for the reading or writing 
of the corresponding bit of each designated data item between the PDS and the associative memory in one 
associative machine cycle. 

The transfer modes available mimic those of the secondary transfers described above, i.e. 

• D8 

• D16 

• D32 

• D64 

An encoding scheme is used for the selection of the bits of PDS which are to be read or written, such that the 
number of bit select wires needing to be accommodated in a given bit column is sensibly minimised. 

The embodiment also makes provision for primary transfers to and from the PDS to be masked by the 
introduction of a mask bit per word row. 

The combination of the row and column encoding schemes and selective wiring of the memory bit-cells to 
the row or column selects allows a memory cell which is not significantly larger than a conventional 
orthogonal memory cell, but which supports a variety of data sizes and efficiently packed data transfers. 



2.2. Prior Art 



• Corner turning networks 

• Orthogonal Memories 



2.3. Detailed Description 



2.3.1. ASProCore Architecture 

In general, one may consider the ASProCore to be an array of Associative Processing Elements (APEs) 
of arbitrary size, with the actual number of physical APEs depending upon the implementation technology. 
The modular-MPC technology - of which ASProCore is a family member - allows the flexible connection 
of modules into parallel processing System-on-Chip (SoC) solutions. The ASProCore standard allows 
multiple cores to be transparently connected via a common ASProCore Global Bus (AGbus) standard, with 
independent IO scalability via the ASProCore Local Bus (ALbus). 



2.3.1.1. ASProCore organisation 

A block diagram of the internal architecture of the ASProCore is given in Figure 1, which shows only 
eight APEs for clarity. The ASProCore has several major features: 

1. a 64-bit wide data register which is accessible in serial, D16 and D32 modes, 

2. a 128-bit wide auxiliary data register which is accessible in serial mode only, 

3. an 8-bit wide activity register, 

4. an ALU per APE, with an associated carry register (CR), 

5. three tag registers (TR1, TR2 and TR3), 

6. two registers denoting the activated state of the APEs (AR1, AR2), 

7. an Inter-APE Communications Channel (ICC), 

8. an associated Primary Data Store (PDS) which is accessible in D8, D16, D32 and D64 modes. 



2.3.2. Primary Data Store (PDS) Behavior 

The internal structure of the PDS is shown in Figure 2. Each word row has: 

• 1 x 64-bit PDS data register 

• 1-bit SDT row pointer shift register 

• 1-bit PDT load data register 

• PDT conditional mask logic 

The Secondary Data interface to the PDS is known as the ASProCore Local bus (ALbus). The shift register 
is routed via this bus interface to facilitate modularity. The ends of this shift register (viz. PCI and PCO) 
may be simply chained together to assemble modular IO configurations. 



2.3.2.1. Secondary Data Transfers 

Secondary Data Transfers (SDT) transfer data between the PDS and the ALbus. The PDS SDT pointer 
register in a PE is used to determine whether the associated PDS data register takes part in a data read or 
write operation. 
Two methods are provided to initialise the pointer shift register: 

1. load the pointer register from the PE tag register (TR1) under program control, or 

2. execute a pointer initialisation cycle on the local data interface. 

An ASProCore configured as Start of (I-O) Channel (SOC) will automatically respond to an SDT 
initialisation cycle to initialise a pointer at the head of the memory module. 

Normal secondary transfers comprise: 

1. a write cycle on the local data interface. The SDT pointer shift-register will post-shift following the 
data write, or 

2. a read cycle on the local data interface. The SDT pointer shift-register will post-shift following the data 
read. 

Secondary transfers take place in D8, D16, D32 or D64 modes, which are defined as: 

• D8 transfers are based upon a secondary transfer word comprising eight bytes packed into a 64-bit 
word. The transfers will read/write each byte from/to eight separate APEs and the shift register will 
advance by eight APEs upon completion of the transfer (see Figure 3(a)). 

• D16 transfers are based upon a secondary transfer word comprising four D16 items packed into a 64- 
bit word. The transfers will read/write each D16 item from/to four separate APEs and the shift register 
will advance by four APEs upon completion of the transfer (see Figure 3(b)). 

• D32 transfers are based upon a secondary transfer word comprising two D32 items packed into a 64-bit 
word. The transfers will read/write each D32 item from/to two separate APEs and the shift register 
will advance by two APEs upon completion of the transfer (see Figure 3(c)). 

• D64 transfers are based upon a secondary transfer word comprising a single D64 item. The transfers 
will read/write the D64 item from/to the selected APE and the shift register will advance by one APE upon 
completion of the transfer (see Figure 3(d)). 

The illustrations of Figure 3 show the effect of a single secondary transfer. The labelled fields of the 
selected memory words will be transferred to/from the secondary interface in a single clock cycle. 
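
A single secondary write may be modelled by the following sketch, given purely as an aid to understanding. The names `sdt_write`, `MODES`, `pds` and `ptr` are hypothetical. The sketch assumes that item i of the packed word is written into row pointer + i at the same bit position that the item occupies on the secondary interface, and that the other bits of each row are left unchanged.

```python
MODES = {"D8": 8, "D16": 16, "D32": 32, "D64": 64}


def sdt_write(pds, pointer, word64, mode):
    """Write one packed 64-bit word into adjacent PDS rows.
    Returns the pointer position after the post-shift."""
    bits = MODES[mode]
    count = 64 // bits                 # items (and rows) per transfer
    mask = (1 << bits) - 1
    for i in range(count):
        shift = i * bits               # field position within the word
        item = (word64 >> shift) & mask
        row = pointer + i
        # only the designated field of the row is written
        pds[row] = (pds[row] & ~(mask << shift)) | (item << shift)
    return pointer + count


pds = [0] * 8
ptr = sdt_write(pds, 0, 0x4444333322221111, "D16")
```

After the call, four adjacent rows each hold one 16-bit item and the pointer has advanced by four rows, as described for D16 mode.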



2.3.2.2. Primary Data Transfers 

A PDT (Primary Data Transfer) operation performs the read and/or write of the PDS (Primary Data Store) 
memory bit-column. 

Data may be transferred between a selected bit of the PDS registers and a selected bit of the APE data 
registers. This transfer is known as a Primary Data Transfer (PDT). These transfers take place under the 
control of the AGbus. Transfers occur bit-serially and take place in D8, D16, D32 or D64 modes: 

• All D8 items in the PDS words are transferred to/from the APE data registers simultaneously (see 
Figure 4(a)). 

• All D16 items in the PDS words are transferred to/from the APE data registers simultaneously (see 
Figure 4(b)). 

• All D32 items in the PDS words are transferred to/from the APE data registers simultaneously (see 
Figure 4(c)). 

• All D64 items in the PDS words are transferred to/from the APE data registers simultaneously (see 
Figure 4(d)). 

The illustrations of Figure 4 show the effect of a sequence of primary transfers. The labelled fields of the 
selected memory words will be transferred to/from the primary interface over multiple clock cycles, where 
each bit of the transfer takes a complete associative processing instruction cycle. 
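
The bit-serial nature of a primary transfer, and the fact that an N-bit item requires N cycles, may be illustrated by the sketch below. The function `pdt_load`, its parameters and the per-row list `field_lsb` (the position of each row's item within its PDS word) are assumptions introduced for the illustration only.

```python
def pdt_load(pds_rows, field_lsb, item_bits):
    """Copy an item_bits-wide field from every PDS row into a per-APE
    register, one bit column per cycle. Returns (registers, cycles)."""
    regs = [0] * len(pds_rows)
    cycles = 0
    for bit in range(item_bits):                  # one cycle per bit
        column = [(w >> (field_lsb[i] + bit)) & 1
                  for i, w in enumerate(pds_rows)]
        for ape, value in enumerate(column):      # all APEs in parallel
            regs[ape] |= value << bit
        cycles += 1
    return regs, cycles


rows = [0x1111, 0x2222 << 16, 0x3333 << 32, 0x4444 << 48]
regs, cycles = pdt_load(rows, [0, 16, 32, 48], 16)
```

The cycle count depends only on the item precision and not on the number of word rows taking part.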
2.3.2.2.1. Data Dump 

A data dump involves the transfer of a selected data bit from the APE data registers to the PDS via the 
Tag Register (TR1). 

2.3.2.2.2. Data Load 

A data load involves the transfer of data from the PDS to the APE Data Registers. Data will be 
transferred to the APEs via the activation register (AR1). 

2.3.2.2.3. Data Exchange 

An exchange comprises the transfer of a selected data bit from the APE data registers to the PDS via the 
Tag Register (TR1) with a simultaneous data transfer to the APEs via the activation register (AR1). 

Primary Data Transfers to or from the PDS may be made conditional on the state of TR2 per APE (i.e. 
the read or write may be masked by TR2). 
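
One masked cycle of a data load may be pictured as follows. The function `masked_load_bit` is hypothetical, and the polarity of the mask (a TR2 value of 1 permitting the transfer) is an assumption of this sketch, since the polarity is not stated above.

```python
def masked_load_bit(pds_bits, ar1, tr2):
    """One PDT load cycle: APEs whose TR2 mask bit is set take the PDS
    bit into AR1; the others keep their previous AR1 value."""
    return [p if m else a for p, a, m in zip(pds_bits, ar1, tr2)]


new_ar1 = masked_load_bit([1, 0, 1, 1], [0, 0, 0, 0], [1, 1, 0, 1])
```

In this example the third APE is masked and therefore retains its previous AR1 value.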



2.3.3. Primary Data Store (PDS) Organisation 



2.3.3.1. Memory Cell 

The PDS memory cell is shown in Figure 5 and it is based upon a conventional six-transistor static RAM 
cell, where transistors 501 and 502 provide the path to either: 

• write secondary data from the SDTD (SDT data) and SDTDB (SDT data bar) to the memory cell when 
strobed with SDTRW, or 

• read secondary data from the memory cell onto the SDTD (SDT data) and SDTDB (SDT data bar) bit 
lines when strobed with SDTRW. 

This memory cell is extended by the provision of devices 503, 504, 505 and 506. These transistors provide 
for: 

• write primary data from the PDTD (PDT data) and PDTDB (PDT data bar) to the memory cell when 
strobed with PDTRW and enabled by PDTEN, or 

• read primary data from the memory cell onto the PDTD (PDT data) and PDTDB (PDT data bar) bit 
lines when strobed with PDTRW and enabled by PDTEN. 



2.3.3.2. Secondary Transfer (Word Row) Encoding 

The transfer of secondary data into designated fields of the memory words, in one or more adjacent word 
rows of the memory block, is effected by an inventive claim of this patent, namely the combination of an 
encoding of the word select (viz. SDTRW) in Figure 5 into four separate SDTRW lines, namely 
SDTRW_D, SDTRW_C, SDTRW_B and SDTRW_A, combined with the wiring of bit cells to these 
named strobes in a particular pattern which repeats itself every eight word rows. 



The pattern is defined as: 

row address |                         bit column address
MOD 8       | [63..56] [55..48] [47..40] [39..32] [31..24] [23..16] [15..8] [7..0]
------------+----------------------------------------------------------------------
0           |    D        D        D        D        C        C        B      A
1           |    C        C        C        C        B        B        A      D
2           |    D        D        B        B        C        A        C      C
3           |    B        B        C        C        A        D        D      D
4           |    D        D        D        A        C        C        B      B
5           |    C        C        A        C        B        B        D      D
6           |    D        A        B        B        C        C        C      C
7           |    A        B        C        C        D        D        D      D



where the column index corresponds to the bit column address, and the row index corresponds to the row 
address MODULO 8. An entry with the letter 'A', 'B', 'C' or 'D' implies that the memory cell is wired to 
the row strobe (viz. SDTRW) with the same suffix. 



The generation of the SDTRW_D, SDTRW_C, SDTRW_B and SDTRW_A row strobes is made 
according to the following logic equations: 



row address   Row strobe logic expressions
MOD 8
-----------   --------------------------------
0             SDTRW_A[0] = RS[0]
              SDTRW_B[0] = ~D8.RS[0]
              SDTRW_C[0] = (D32+D64).RS[0]
              SDTRW_D[0] = D64.RS[0]

1             SDTRW_A[1] = (D8+D64).RS[1]
              SDTRW_B[1] = (D16+D64).RS[1]
              SDTRW_C[1] = (D32+D64).RS[1]
              SDTRW_D[1] = D64.RS[1]

2             SDTRW_A[2] = ~D16.RS[2]
              SDTRW_B[2] = (D16+D64).RS[2]
              SDTRW_C[2] = (D32+D64).RS[2]
              SDTRW_D[2] = D64.RS[2]

3             SDTRW_A[3] = (D8+D64).RS[3]
              SDTRW_B[3] = ~D8.RS[3]
              SDTRW_C[3] = (D32+D64).RS[3]
              SDTRW_D[3] = D64.RS[3]

4             SDTRW_A[4] = (D8+D64).RS[4]
              SDTRW_B[4] = ~D8.RS[4]
              SDTRW_C[4] = (D32+D64).RS[4]
              SDTRW_D[4] = D64.RS[4]

5             SDTRW_A[5] = ~D16.RS[5]
              SDTRW_B[5] = (D16+D64).RS[5]
              SDTRW_C[5] = (D32+D64).RS[5]
              SDTRW_D[5] = D64.RS[5]

6             SDTRW_A[6] = (D8+D64).RS[6]
              SDTRW_B[6] = (D16+D64).RS[6]
              SDTRW_C[6] = (D32+D64).RS[6]
              SDTRW_D[6] = D64.RS[6]

7             SDTRW_A[7] = RS[7]
              SDTRW_B[7] = ~D8.RS[7]
              SDTRW_C[7] = (D32+D64).RS[7]
              SDTRW_D[7] = D64.RS[7]


where the designator RS[n] is the Row Select input associated with the given word row (i.e. RS[n] is the 
row select for word row n). The row selects are derived according to the network shown in Figure 6. 
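
The interaction of the wiring pattern and the row strobe equations may be checked with the following sketch, which models only word rows 4 to 7 using the wiring letters and strobe expressions given above for those rows. The names `PATTERN`, `strobes` and `written_bytes` are hypothetical, the row select RS is assumed to be asserted, and byte lane 0 denotes bit columns [7..0].

```python
# Wiring letters for rows 4..7, columns [63..56] down to [7..0].
PATTERN = {
    4: "DDDACCBB",
    5: "CCACBBDD",
    6: "DABBCCCC",
    7: "ABCCDDDD",
}


def strobes(row, mode):
    """Letters of the row strobes asserted for a selected row (RS = 1)."""
    d8, d16, d32, d64 = (mode == m for m in ("D8", "D16", "D32", "D64"))
    a = {4: d8 or d64, 5: not d16, 6: d8 or d64, 7: True}[row]
    b = {4: not d8, 5: d16 or d64, 6: d16 or d64, 7: not d8}[row]
    c = d32 or d64
    d = d64
    return {s for s, on in zip("ABCD", (a, b, c, d)) if on}


def written_bytes(row, mode):
    """Byte lanes (7 = bits [63..56], 0 = bits [7..0]) written in a row."""
    on = strobes(row, mode)
    return [7 - i for i, letter in enumerate(PATTERN[row]) if letter in on]
```

For example, `written_bytes(5, "D16")` yields byte lanes 3 and 2, i.e. the contiguous field [31..16], while in D8 mode each of the four rows is written in exactly one byte lane.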

The SDT row pointer shift register is organised so that it advances (skips) according to the selected I-O 
mode. For example, transfers in D8 mode will cause the register to advance in steps of 8, i.e. shifting from 
BIT[0] to BIT[8] to BIT[16] etc. Similarly, transfers in D16 mode will cause the register to advance in 
steps of 4, i.e. shifting from BIT[0] to BIT[4] to BIT[8] etc. 
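
The stepping of the SDT row pointer may be summarised by the sketch below. The names `STEP` and `pointer_positions` are hypothetical. The steps for D8 and D16 are those stated above; the steps of two rows for D32 and one row for D64 follow from the description of those transfer modes and are otherwise an assumption of the sketch.

```python
STEP = {"D8": 8, "D16": 4, "D32": 2, "D64": 1}


def pointer_positions(mode, transfers, start=0):
    """Row index held by the pointer before each of `transfers` transfers."""
    return [start + n * STEP[mode] for n in range(transfers)]


d8_rows = pointer_positions("D8", 3)
d16_rows = pointer_positions("D16", 3)
```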



2.3.3.3. Primary Transfer (Bit Column) Encoding 

The transfer of primary data between designated fields of the memory words, to or from a given bit of one 
or more adjacent word rows of the memory block, is effected by an inventive claim of this patent, namely 
the combination of an encoding of the column enable select (viz. PDTEN) in Figure 5 into four separate 
column PDTEN lines, namely PDTEN_D, PDTEN_C, PDTEN_B and PDTEN_A, combined with the 
wiring of bit cells to these named enables in a particular pattern which repeats itself every eight word rows. 



The pattern is defined as: 

row address |                         bit column address
MOD 8       | [63..56] [55..48] [47..40] [39..32] [31..24] [23..16] [15..8] [7..0]
------------+----------------------------------------------------------------------
0           |
1           |
2           |
3           |
4           |
5           |                                                 B        D      D
6           |    D        A        B        B        C        C        C      C
7           |    A        B        C        C        D        D        D      D



where the column index corresponds to the bit column address, and the row index corresponds to the row 
address MODULO 8. An entry with the letter 'A', 'B', 'C' or 'D' implies that the memory cell is wired to 
the column enable (viz. PDTEN) with the same suffix. 



The generation of the PDTEN_D, PDTEN_C, PDTEN_B and PDTEN_A column enables is made 
according to the following logic equations: 
byte      bit        Column enable logic expressions
address   columns
-------   -------    ------------------------------------------
0         7..0       PDTEN_A[7..0] = BS[0].CS[7..0]
                     PDTEN_B[7..0] = ~D8.BS[0].CS[7..0]
                     PDTEN_C[7..0] = (D32+D64).BS[0].CS[7..0]
                     PDTEN_D[7..0] = D64.BS[0].CS[7..0]

1         15..8      PDTEN_A[15..8] = (D8+D64).BS[1].CS[15..8]
                     PDTEN_B[15..8] = ~D8.BS[1].CS[15..8]
                     PDTEN_C[15..8] = (D32+D64).BS[1].CS[15..8]
                     PDTEN_D[15..8] = D64.BS[1].CS[15..8]

2         23..16     PDTEN_A[23..16] = ~D16.BS[2].CS[23..16]
                     PDTEN_B[23..16] = (D16+D64).BS[2].CS[23..16]
                     PDTEN_C[23..16] = (D32+D64).BS[2].CS[23..16]
                     PDTEN_D[23..16] = D64.BS[2].CS[23..16]

3         31..24     PDTEN_A[31..24] = (D8+D64).BS[3].CS[31..24]
                     PDTEN_B[31..24] = ~D8.BS[3].CS[31..24]
                     PDTEN_C[31..24] = (D32+D64).BS[3].CS[31..24]
                     PDTEN_D[31..24] = D64.BS[3].CS[31..24]

4         39..32     PDTEN_A[39..32] = (D8+D64).BS[4].CS[39..32]
                     PDTEN_B[39..32] = ~D8.BS[4].CS[39..32]
                     PDTEN_C[39..32] = (D32+D64).BS[4].CS[39..32]
                     PDTEN_D[39..32] = D64.BS[4].CS[39..32]

5         47..40     PDTEN_A[47..40] = ~D16.BS[5].CS[47..40]
                     PDTEN_B[47..40] = (D16+D64).BS[5].CS[47..40]
                     PDTEN_C[47..40] = (D32+D64).BS[5].CS[47..40]
                     PDTEN_D[47..40] = D64.BS[5].CS[47..40]

6         55..48     PDTEN_A[55..48] = (D8+D64).BS[6].CS[55..48]
                     PDTEN_B[55..48] = ~D8.BS[6].CS[55..48]
                     PDTEN_C[55..48] = (D32+D64).BS[6].CS[55..48]
                     PDTEN_D[55..48] = D64.BS[6].CS[55..48]

7         63..56     PDTEN_A[63..56] = BS[7].CS[63..56]
                     PDTEN_B[63..56] = ~D8.BS[7].CS[63..56]
                     PDTEN_C[63..56] = (D32+D64).BS[7].CS[63..56]
                     PDTEN_D[63..56] = D64.BS[7].CS[63..56]



The definition of the BS[n] (i.e. Byte Select [n]) in the above equations is further given by the equations: 

BS[0] = ~A5.~A4.~A3.D64 + ~A4.~A3.D32 + ~A3.D16 + D8 
BS[1] = ~A5.~A4.A3.D64 + ~A4.A3.D32 + A3.D16 + D8 
BS[2] = ~A5.A4.~A3.D64 + A4.~A3.D32 + ~A3.D16 + D8 
BS[3] = ~A5.A4.A3.D64 + A4.A3.D32 + A3.D16 + D8 
BS[4] = A5.~A4.~A3.D64 + ~A4.~A3.D32 + ~A3.D16 + D8 
BS[5] = A5.~A4.A3.D64 + ~A4.A3.D32 + A3.D16 + D8 
BS[6] = A5.A4.~A3.D64 + A4.~A3.D32 + ~A3.D16 + D8 
BS[7] = A5.A4.A3.D64 + A4.A3.D32 + A3.D16 + D8 

where A5, A4 and A3 represent the corresponding bits of the column address specification. The particular 
bit column address within the given byte field (i.e. CS) is derived in the 
normal manner from the lower three bits of the address (i.e. A2, A1 and A0). 
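
The byte select decoding may be modelled as in the sketch below. The function names `byte_selects` and `column_within_byte` are hypothetical. The sketch assumes a 6-bit column address A5..A0 and takes the D32 term of BS[3] and BS[7] as A4.A3, following the pattern of the remaining equations; this is an assumption of the sketch rather than a statement of the embodiment.

```python
def byte_selects(addr, mode):
    """Return BS[0..7] for a 6-bit column address in the given mode."""
    a5, a4, a3 = (addr >> 5) & 1, (addr >> 4) & 1, (addr >> 3) & 1
    bs = []
    for n in range(8):
        n5, n4, n3 = (n >> 2) & 1, (n >> 1) & 1, n & 1
        if mode == "D8":
            bs.append(1)                               # every byte
        elif mode == "D16":
            bs.append(int(a3 == n3))                   # byte within 16 bits
        elif mode == "D32":
            bs.append(int((a4, a3) == (n4, n3)))       # byte within 32 bits
        elif mode == "D64":
            bs.append(int((a5, a4, a3) == (n5, n4, n3)))
        else:
            raise ValueError("unknown mode")
    return bs


def column_within_byte(addr):
    """Bit position within the selected byte field, from A2, A1 and A0."""
    return addr & 0b111
```

In D8 mode every byte field is selected, so a single bit address reaches the corresponding bit of all eight packed items at once, whereas in D64 mode exactly one byte field is selected.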
Figure 1 ASProCore Array Architecture 